Prediction of Ordered Random Effects in a Simple Small Area Model
Abstract: Prediction of a vector of ordered parameters or part of it arises naturally
in the context of Small Area Estimation (SAE). For example, one may want to
estimate the parameters associated with the top ten areas, the best or worst area,
or a certain percentile. We use a simple SAE model to show that estimation of
ordered parameters by the corresponding ordered estimates of each area separately
does not yield good results with respect to MSE. Shrinkage-type predictors, with
an appropriate amount of shrinkage for the particular problem of ordered parameters,
are considerably better, and their performance is close to that of the optimal
predictors, which cannot in general be computed explicitly.
Linear predictor.

1 Introduction 

We study the prediction of ordered random effects in a simple model, motivated 
by Small Area Estimation (SAE), under a quadratic loss function. The model is 

$$y_i = \mu + u_i + e_i, \quad i = 1, \ldots, m, \qquad (1.1)$$

where $y_i$ is observed, $\mu$ is an unknown constant, $u_i \overset{iid}{\sim} F(0, \sigma_u^2)$ and $e_i \overset{iid}{\sim} G(0, \sigma_e^2)$,
and $F$ and $G$ are general distributions with zero means and variances $\sigma_u^2$ and $\sigma_e^2$.
Set $y = (y_1, \ldots, y_m)$, $u = (u_1, \ldots, u_m)$, and $e = (e_1, \ldots, e_m)$, and assume that $u$
and $e$ are independent. Set $\theta_i = \mu + u_i$ and $\theta = (\theta_1, \ldots, \theta_m)$. The purpose is
to predict the ordered random variables $\theta_{(i)}$ ($\theta_{(1)} \le \theta_{(2)} \le \cdots \le \theta_{(m)}$) from the
observed $y$'s. In SAE the random effect $\theta_i$ represents the $i$th area parameter.

The above model is a special case of the SAE model of Fay and Herriot 
(1979) that was presented in the context of estimating per capita income for small 
places (i.e., population less than 1,000) from the 1970 Census of Population and 
Housing. The original Fay-Herriot model allows different $\mu_i$ of the form $\mu_i = x_i'\beta$,
where $x_i$ is a vector of covariates for area $i$, $\beta$ is a vector of coefficients that are
common to all areas, $u_i$ is a random effect of area $i$, and $\theta_i = \mu_i + u_i$, the value
of interest in area $i$, is measured with a sampling error $e_i$. The SAE literature
is concerned with the estimation of $\theta_i$; see, e.g., Rao (2003). However, it is also
natural to consider the ordered parameters $\theta_{(i)}$ if one is interested in estimating
jointly the best, second best, median, or worst area's parameter, for example, or
in studying the best or worst $k$ areas. In these cases, one is interested in many
or all ranked parameters, and not just a single $\theta_{(i)}$.

When we have more than one observation per area, the model is known as 
the Battese-Harter-Fuller model (1988), see also Pfeffermann (2002), which we 
again simplify as in (1.1):

$$y_{ij} = \mu + u_i + e_{ij}, \quad j = 1, \ldots, n, \; i = 1, \ldots, m. \qquad (1.2)$$

Typically in SAE $m$ is large, while $n$ is small; however, we consider both small
and large $m$. Taking area means, as justified by sufficiency, the latter model
reduces to that of (1.1) with $\sigma_e^2$ replaced by $\sigma_e^2/n$. When $\sigma_e^2$ is unknown,
it should be estimated. A main idea in SAE is to borrow strength across the
different areas in order to predict effects. This can be applied also to variance 
estimation when some of the areas have only one observation; however, this is 
beyond the scope of the present paper, and for simplicity we assume the same 
number of observations n in each area. 

To see the difference between predicting the unordered vector $\theta$ and the
ordered vector $\theta_{()} = (\theta_{(1)}, \ldots, \theta_{(m)})$, consider estimating the maximum $\theta_{(m)}$
and two natural unbiased predictors of $\theta_i$: $\hat\theta_i = y_i$ or $\hat\theta_i = E(\theta_i|y)$. By Jensen's
inequality, $\hat\theta_{(m)} := \max_i \hat\theta_i$ is an overestimate in expectation of $\theta_{(m)}$ in the case
of $\hat\theta_i = y_i$, and an underestimate if we use $\hat\theta_i = E(\theta_i|y)$. Such biases increase in
m, which in SAE and in many parts of this paper is taken to be large. Similar 
considerations hold for other ordered parameters. 
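
The two biases can be seen in a small Monte Carlo sketch. It assumes normal $F$ and $G$, $\mu = 0$ known, and arbitrary illustrative values of $m$, the variances, the number of replications and the seed; the function name `max_bias` is hypothetical and not part of the paper.

```python
import random
import statistics

def max_bias(m=50, sigma_u=1.0, sigma_e=1.0, reps=2000, seed=0):
    """Monte Carlo bias of two naive predictors of theta_(m) under model (1.1).

    Assumptions (illustrative only): F and G normal, mu = 0 known.
    """
    rng = random.Random(seed)
    gamma_star = sigma_u**2 / (sigma_u**2 + sigma_e**2)
    bias_raw, bias_post = [], []
    for _ in range(reps):
        theta = [rng.gauss(0.0, sigma_u) for _ in range(m)]
        y = [t + rng.gauss(0.0, sigma_e) for t in theta]
        true_max = max(theta)
        # Predictor based on theta_hat_i = y_i
        bias_raw.append(max(y) - true_max)
        # Predictor based on the posterior mean, which is gamma* y_i when mu = 0
        bias_post.append(gamma_star * max(y) - true_max)
    return statistics.mean(bias_raw), statistics.mean(bias_post)

raw, post = max_bias()
print(f"bias of max y_i: {raw:+.3f}; bias of max E(theta_i|y): {post:+.3f}")
```

Under these settings the first bias comes out positive and the second negative, in line with the Jensen argument.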

With different loss functions, prediction of the ordered parameters appears 
in Wright, Stern and Cressie (2003), and prediction and ranking of small area 
parameters appear in Shen and Louis (1998). Their Bayesian methods require 
heavy numerical calculations, and are sensitive to the choice of priors; see Shen 
and Louis (2000). 

If $\mu$ and $\sigma_u^2$ and/or $\sigma_e^2$ are known, we have a Bayesian model in (1.1) or
(1.2), and under quadratic loss, the optimal predictors of the ordered parameters
would be of the form $\hat\theta_{(i)} = E(\theta_{(i)}|y)$, where the expectation depends on $\mu$, $\sigma_u^2$
and $\sigma_e^2$ (and the distributions $F$ and $G$). If $\mu$ and $\sigma_u^2$ and/or $\sigma_e^2$ are unknown, we
adopt an Empirical Bayes approach, and estimate them from the data. However,
even in the normal case, analytical computation of $E(\theta_{(i)}|y)$ seems intractable
for $m > 2$, and even more so under other distributions. Numerical computations
could be done, and in fact, this is the Bayesian approach taken in principle by 
Wright et al (2003) and Shen and Louis (1998, 2000); the precise quantities they 
compute are different due to the fact that they use different loss functions. 

In this paper we avoid such Bayesian calculations and present simple predic- 
tors whose performance is close to optimal; furthermore, due to their simplicity, 
they are more robust against model misspecification. 

Our starting point is the following. Consider the predictor $\hat\theta_i = E(\theta_i|y)$ of
$\theta_i$; under the assumption that $F$ and $G$ are normal, we have $\hat\theta_i = \hat\theta_i(\mu, \gamma^*) = \gamma^* y_i + (1 - \gamma^*)\mu$, where $\gamma^* = \sigma_u^2/(\sigma_u^2 + \sigma_e^2)$. For unknown $\mu$, we plug in the
estimator $\hat\mu = \bar y$, and obtain the shrinkage-type predictor $\hat\theta_i(\hat\mu) = \gamma^* y_i + (1 - \gamma^*)\bar y$,
which is the best linear unbiased predictor of $\theta_i$ for any $F$ and $G$; see, e.g.,
Robinson (1991), Rao (2003). Here $\gamma^*$ determines the amount of shrinkage toward
the mean. We discuss the required amount of shrinkage when the goal is to predict
$\theta_{(i)}$ rather than $\theta_i$.

For the problem of predicting the unordered parameters, Bayesian considerations
as above, and Stein (1956) and the ensuing huge body of literature suggest
shrinkage predictors. In view of the discussion on under- and overestimation, it
is not surprising that for the present problem of predicting the ordered parameters,
shrinkage is also desirable, but to a lesser extent. In fact, it can be shown
geometrically that if the coordinates of the predicting vector $\hat\theta = (\hat\theta_1, \ldots, \hat\theta_m)$
happen to have the right order, that is, the same order as the coordinates of $\theta$,
then the desirable shrinkage is the same for the two problems, but otherwise it
is smaller. The latter case happens with high probability for large $m$ when the
parameters are not very different. In this paper we show that rather satisfactory
results can be obtained by simple predictors of the type $\gamma y_{(i)} + (1 - \gamma)\bar y$, and
study the optimal value of $\gamma$. In general the optimal $\gamma$ satisfies $\gamma^* \le \gamma$. Specifically, for large
$m$ ($m > 25$, say), we propose the predictor $\sqrt{\gamma^*}\, y_{(i)} + (1 - \sqrt{\gamma^*})\bar y$, to be denoted
later by $\hat\theta^{[2]}_{(i)}(\sqrt{\gamma^*})$. This predictor is easy to compute when the variances are
either known or estimated, and performs well in comparison to Bayes predictors,
and other numerically demanding predictors that appear in the literature.

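In most of this paper we consider some predictor $\hat\theta = (\hat\theta_1, \ldots, \hat\theta_m)$, take $\hat\theta_{(i)}$
as a predictor of $\theta_{(i)}$, with a loss function given by

$$L(\hat\theta_{()}, \theta_{()}) = \sum_{i=1}^m \left(\hat\theta_{(i)} - \theta_{(i)}\right)^2, \qquad (1.3)$$

and compare different predictors in terms of the (Bayesian) risk

$$r(H, \hat\theta) = E\{L(\hat\theta_{()}, \theta_{()})\},$$

where $H = (F, G)$ and the expectation is over all random variables involved. Note
that by a simple rearrangement inequality we always have $\sum_{i=1}^m (\hat\theta_{(i)} - \theta_{(i)})^2 \le \sum_{i=1}^m (\hat\theta_i - \theta_i)^2$.

We also briefly consider the individual mean square error (MSE) of a predictor
$\hat\theta_{(i)}$, defined to be $MSE(\hat\theta_{(i)}) = E(\hat\theta_{(i)} - \theta_{(i)})^2$.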
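
As a small illustration of the loss (1.3) and of the rearrangement inequality just mentioned, the following sketch compares the ordered and unordered squared-error losses on toy data. The function names and the toy data are hypothetical.

```python
import random

def ordered_loss(pred, theta):
    """Loss (1.3): squared distance between sorted predictions and sorted parameters."""
    return sum((p - t) ** 2 for p, t in zip(sorted(pred), sorted(theta)))

def unordered_loss(pred, theta):
    """Usual componentwise squared-error loss for the unordered problem."""
    return sum((p - t) ** 2 for p, t in zip(pred, theta))

rng = random.Random(1)
theta = [rng.gauss(0.0, 1.0) for _ in range(8)]     # toy "area parameters"
pred = [t + rng.gauss(0.0, 1.0) for t in theta]     # toy noisy predictions

# Sorting both vectors can only decrease the sum of squared differences.
print(ordered_loss(pred, theta), unordered_loss(pred, theta))
```

The ordered loss ignores which area a value belongs to, so a predictor can have zero ordered loss while being wrong area by area.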

Even in the case of m = 2 this prediction problem is not trivial. Blumenthal 
and Cohen (1968) consider the following model: given independent observations
$X_1, X_2$ with $X_i \sim N(\theta_i, \tau^2)$, estimate $\theta_{(2)} = \max(\theta_1, \theta_2)$. They present five
different estimators for $\theta_{(2)}$ and evaluate their biases and mean square errors.
Generalizing their method to more than two parameters appears to be hard. 

Finally we mention that Senn (2008), with reference to Dawid (1994) and 
others, deals with a different but related problem of estimation of the parameter 
$\theta_{i^*}$ corresponding to $i^* = \arg\max_i y_i$. In SAE, this is the parameter belonging to
the population having the largest sample outcome, while we consider estimation
of $\theta_{(m)}$, the parameter of the "largest population" (and likewise for other ordered
parameters). The difference is important when m is large, and the parameters 
vary significantly; our interest is in the ordered parameters and not parameters 
chosen by the data, as in the above references. 

In Section 2 we discuss the model and present several predictors. We also
give minimax results that provide some justification to normality assumptions
and linear prediction. Section 3 contains the main results on properties and
comparisons of the various predictors. The proposed class of predictors contains
a parameter $\gamma$. Some of the results apply to the whole class, while others suggest
a range where the best value of $\gamma$ should be, and apply to $\gamma$ in this range. We
describe a conjecture about the optimal value of $\gamma$ when $m$ is large and provide
an approximation for the optimal value of $\gamma$ in the normal case. The last part
of Section 3 deals with a special case when $F$ and $G$ are normal and $m = 2$. In
this part we get tighter conclusions than in the general case.

In Section 3 we assume that the variances in (1.1) are known. Section 4
deals with the case of unknown variances and studies plug-in Empirical Bayes
predictors by simulation. In Sections 5 and 6 we study robustness of the proposed
predictors against certain misspecifications of the assumptions on the distributions,
and compare to other predictors from the literature.

The proofs of results concerning general $m$ are given in Section 7. The rest
of the proofs are given in an on-line Supplement at
http://www.stat.sinica.edu.tw/statistica. In the Supplement we provide
simulations for Conjectures 1 and 2, we compare various predictors under the
assumption of known variances, and when one of the variances is unknown.
Theorems 5 and 6 are also proved there.

2 Predictors 

2.1 Unordered parameters 

In Sections 2 and 3 we assume that $\sigma_u^2$ and $\sigma_e^2$ are known. Later they are assumed
unknown, and plug-in estimators are used. First we review some known results
for the unordered case of Model (1.1) and the standard problem of predicting
$\theta_i$, $i = 1, \ldots, m$. The best linear predictor is of the form $a y_i + b$, with $a$, $b$ that
minimize the mean square error. It is easy to see that when $\mu$ is known, and
recalling that $\sigma_u^2$ and $\sigma_e^2$ are now also assumed to be known, the best linear
predictor of $\theta_i$, that is, the predictor that minimizes $E(\hat\theta_i - \theta_i)^2$ and therefore
$r^*(H, \hat\theta) := E\left[\sum_{i=1}^m (\hat\theta_i - \theta_i)^2\right]$ among linear predictors, is

$$\hat\theta_i(\mu) = \gamma^* y_i + (1 - \gamma^*)\mu, \qquad (2.1)$$

where $\gamma^* = \sigma_u^2/(\sigma_u^2 + \sigma_e^2)$. Note that the model (1.1) does not assume normality,
and that the best linear predictor is unbiased, that is, $E(\hat\theta_i - \theta_i) = 0$. When
both distributions $F$ and $G$ are normal, the best linear predictor above is the
best predictor (or Bayes predictor).

For unknown $\mu$, the best linear unbiased predictor (BLUP) of $\theta_i$ (see, for
example, Robinson (1991), Rao (2003) and references therein) is

$$\hat\theta_i(\hat\mu) = \gamma^* y_i + (1 - \gamma^*)\bar y, \qquad (2.2)$$

where $\hat\mu := \bar y = \frac{1}{m}\sum_{i=1}^m y_i$. The BLUP property means that the predictor (2.2)
minimizes $E(\hat\theta_i - \theta_i)^2$ among linear unbiased predictors for all $F$ and $G$ with
the prescribed variances. These are shrinkage predictors (with shrinkage towards
the mean). Such predictors appear also in Fay and Herriot (1979). We see
in Section 3 that for the ordered parameters shrinkage is also required, but in a
smaller amount (see also Louis (1984) and Ghosh (1992) for such shrinkage, under
different loss functions), showing again that the related problems of predicting
the ordered and unordered parameters are not the same.

A justification of normality and linearity

Using the fact that an equalizer Bayes rule is minimax, Schwarz (1987) proved
the following result, which in some sense justifies both linear estimators and the
assumption of normality of $F$ and $G$.

Theorem 1. Consider (1.1) with $\mu$, $\sigma_u^2$, and $\sigma_e^2$ all fixed and known, and the
risk function $r^*(H, \delta) = E\left(\sum_{i=1}^m (\delta_i - \theta_i)^2\right)$. The predictor $\delta_0 = (\delta_{01}, \ldots, \delta_{0m})$
of $\theta$ given by $\delta_{0i} = \gamma^* y_i + (1 - \gamma^*)\mu$, $i = 1, \ldots, m$, is minimax and the normal
strategy for $H = (F, G)$ is least favorable.

The next result is closely related to the previous one, and justifies linearity
when $\mu$ is unknown, which is the case we consider. It can easily be extended to
the original Fay-Herriot model with $\mu_i = x_i'\beta$.

Theorem 2. Under the assumptions of Theorem 1 but with unknown $\mu$, the
predictor defined by $\delta_{0i} = \gamma^* y_i + (1 - \gamma^*)\bar y$, $i = 1, \ldots, m$, is minimax among all
linear unbiased predictors of $\theta$.

Proof. Let $\mathcal{H}$ denote the class of pairs of distributions $(F, G)$ having the given
variances. Note that $r^*(H, \delta_0)$ depends only on the fixed variances, and therefore
for $H \in \mathcal{H}$, $r^*(H, \delta_0)$ is constant, say $v$. Let $\mathcal{L}$ denote the class of linear unbiased
predictors.

We know that $\delta_0 = (\delta_{01}, \ldots, \delta_{0m})$ is BLUP. We have

$$\overline{V} := \inf_{\delta \in \mathcal{L}} \sup_{H \in \mathcal{H}} r^*(H, \delta) \le \sup_{H \in \mathcal{H}} r^*(H, \delta_0) = r^*(H_0, \delta_0) = v,$$
$$\underline{V} := \sup_{H \in \mathcal{H}} \inf_{\delta \in \mathcal{L}} r^*(H, \delta) \ge \inf_{\delta \in \mathcal{L}} r^*(H_0, \delta) = r^*(H_0, \delta_0) = v,$$

for any $H_0 \in \mathcal{H}$, where the penultimate equality holds by the BLUPness of $\delta_0$.
Since clearly $\overline{V} \ge \underline{V}$, it follows that $\inf_{\delta \in \mathcal{L}} \sup_{H \in \mathcal{H}} r^*(H, \delta) = \sup_{H \in \mathcal{H}} r^*(H, \delta_0)$,
so that $\delta_0$ is minimax among predictors in $\mathcal{L}$ as required. □

2.2 Ordered parameters 

Let $\hat\theta_{(i)}(\mu) = E_\mu(\theta_{(i)}|y)$, the best predictor of $\theta_{(i)}$ when $\mu$ is known, and consider
its empirical or plug-in version when $\mu$ is unknown: $E_{\hat\mu}(\theta_{(i)}|y) = \hat\theta_{(i)}(\hat\mu)$, where
$\hat\mu = \bar y$.

We consider three predictors:

$$\hat\theta^{[1]}_{(i)} = y_{(i)}, \qquad \hat\theta^{[2]}_{(i)}(\gamma) = \gamma y_{(i)} + (1 - \gamma)\bar y, \qquad \hat\theta^{[3]}_{(i)} = E_{\hat\mu}(\theta_{(i)}|y), \qquad (2.3)$$

where $y_{(1)} \le \ldots \le y_{(m)}$ denote the order statistics of $y_1, \ldots, y_m$.

Set $\hat\theta^{[k]}_{()} = (\hat\theta^{[k]}_{(1)}, \ldots, \hat\theta^{[k]}_{(m)})$ for $k = 1, 2, 3$. The predictors in the class $\hat\theta^{[2]}_{()}(\gamma)$
are analogous to the best linear predictor for the unordered case, but as we shall
see, the value of $\gamma$ has to be reconsidered, and $\hat\theta^{[3]}_{()}$ is the empirical best predictor
in the ordered case (the best predictor with $\mu$ replaced by $\bar y$). The latter predictor
cannot in general be computed explicitly for $m > 2$, and some of our results are
aimed at showing that it can be efficiently replaced by $\hat\theta^{[2]}_{()}(\gamma)$ with an appropriate
choice of $\gamma$ for the ordered case at hand. Thus $\hat\theta^{[3]}_{()}$, the empirical best predictor,
will be used as a yardstick to which other predictors are compared.
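
A minimal sketch of the two explicitly computable predictors in (2.3) follows. The function name `predict_ordered`, the toy data and the variance values are hypothetical; taking $\gamma = 1$ reproduces the order statistics, that is, the first predictor.

```python
import math

def predict_ordered(y, gamma):
    """Shrinkage predictor gamma * y_(i) + (1 - gamma) * ybar for i = 1..m (ascending order)."""
    ybar = sum(y) / len(y)
    return [gamma * v + (1.0 - gamma) * ybar for v in sorted(y)]

y = [4.0, 1.0, 7.0, 0.0]            # toy observations with mean 3
sigma_u2, sigma_e2 = 1.0, 3.0       # hypothetical known variances, so gamma* = 0.25
gamma_star = sigma_u2 / (sigma_u2 + sigma_e2)

theta1 = predict_ordered(y, 1.0)                     # order statistics y_(i)
theta2 = predict_ordered(y, math.sqrt(gamma_star))   # shrinkage with gamma = sqrt(gamma*) = 0.5
print(theta1)
print(theta2)
```

The third predictor in (2.3) is not included because, as noted above, it has no explicit form for $m > 2$.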

3 Main results, known variances 

3.1 General distributions F and G and general m 

The proofs of the results of this subsection are given in the Appendix.
The first few results show that shrinkage-type predictors in the class $\hat\theta^{[2]}_{()}(\gamma)$
perform better than the predictor $\hat\theta^{[1]}_{()}$. Refined calculations of the range of the
optimal $\gamma$ allow us to understand the amount of shrinkage required for the ordered
parameters case.

Theorem 3. Consider (1.1) with the loss function (1.3) and $\gamma^* = \sigma_u^2/(\sigma_u^2 + \sigma_e^2)$. If

$$\frac{m}{m-1}\left(2\sqrt{\gamma^*} - 1\right) - \frac{1}{m-1}\left(2\gamma^* - 1\right) \le \gamma \le 1, \qquad (3.1)$$

then

$$E\{L(\hat\theta^{[2]}_{()}(\gamma), \theta_{()})\} \le E\{L(\hat\theta^{[1]}_{()}, \theta_{()})\}. \qquad (3.2)$$

Note that if $\sigma_u^2 \to 0$ then $\gamma^* \to 0$ and the left-hand side of (3.1) tends to $-1$.
If $m \to \infty$, which is of interest in SAE, then the left-hand side of (3.1) tends to
$2\sqrt{\gamma^*} - 1$. The left-hand side of (3.1) is 1 when $\gamma^* = 1$ and increases in $\gamma^*$, hence
is bounded by 1.
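
The behaviour of the left-hand side of (3.1) described above can be checked numerically. The sketch below is only an evaluation of that expression; the function name `gamma_lower_bound` is hypothetical.

```python
import math

def gamma_lower_bound(m, gamma_star):
    """Left-hand side of (3.1): m/(m-1) * (2 sqrt(g*) - 1) - 1/(m-1) * (2 g* - 1)."""
    return (m * (2.0 * math.sqrt(gamma_star) - 1.0) - (2.0 * gamma_star - 1.0)) / (m - 1)

for m in (2, 10, 100):
    row = [round(gamma_lower_bound(m, g), 3) for g in (0.0, 0.25, 0.5, 1.0)]
    print(m, row)
```

Any $\gamma$ between the printed value and 1 satisfies the sufficient condition of Theorem 3; a negative value means that every $\gamma$ in $[0, 1]$ qualifies.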

By verifying the condition for the left-hand side of (3.1) to be nonpositive,
we obtain the following.

Corollary 1. If

$$\gamma^* \le \left(\frac{m - \sqrt{m^2 - 2m + 2}}{2}\right)^2, \qquad (3.3)$$

then

$$E\{L(\hat\theta^{[2]}_{()}(\gamma), \theta_{()})\} \le E\{L(\hat\theta^{[1]}_{()}, \theta_{()})\} \qquad (3.4)$$

for all $\gamma$, $0 \le \gamma \le 1$.

Note that asymptotically (3.3) becomes $\gamma^* \le \frac{1}{4}$, since the right-hand side of (3.3)
tends to $\frac{1}{4}$ as $m \to \infty$. Condition (3.3) is sufficient and may not be necessary. But (3.4) does not
hold without a suitable condition on $\gamma^*$; for example, if $m = 100$, the upper
bound of (3.3) is 0.2475. For $\gamma^* = 1/3$ and $\gamma = 0.1$, a straightforward simulation
using normal variables shows that (3.4) does not hold.
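
The numerical values quoted for the bound in Corollary 1 can be reproduced from the condition that the left-hand side of (3.1) be nonpositive, which is a quadratic inequality in $\sqrt{\gamma^*}$ with smaller root $(m - \sqrt{m^2 - 2m + 2})/2$. The sketch below assumes this closed form; the function name is hypothetical.

```python
import math

def corollary1_bound(m):
    """Largest gamma* for which the left-hand side of (3.1) is nonpositive."""
    root = (m - math.sqrt(m * m - 2 * m + 2)) / 2.0   # smaller root in sqrt(gamma*)
    return root ** 2

for m in (2, 10, 100, 10**6):
    print(m, round(corollary1_bound(m), 4))
```

For $m = 100$ this gives approximately 0.2475, and the values approach 1/4 as $m$ grows.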

For $\gamma = \gamma^*$, (3.1) holds if and only if $\gamma^* \le (m - 1)^2/(m + 1)^2$ (see Appendix).
From this result we obtain the following.

Corollary 2. If

$$\gamma^* \le (m - 1)^2/(m + 1)^2, \qquad (3.5)$$

then $E\{L(\hat\theta^{[2]}_{()}(\gamma), \theta_{()})\} \le E\{L(\hat\theta^{[1]}_{()}, \theta_{()})\}$ for all $\gamma$, $\gamma^* \le \gamma \le 1$.

Asymptotically, Corollary 2 holds for all $\gamma^*$ without (3.5), because $\lim_{m \to \infty}\{(m - 1)^2/(m + 1)^2\} = 1$
and $0 \le \gamma^* \le 1$ by definition. A small simulation study indicates
that Corollary 2 may hold without the condition $\gamma^* \le (m - 1)^2/(m + 1)^2$
for a large variety of $F$ and $G$. We can prove it only for the extreme case $m = 2$
and normal $F$ and $G$; see Theorem 5 below. The range of $\gamma$'s for which shrinkage
improves the predictors, $\gamma^* \le \gamma$, indicates that, for the ordered problem, less
shrinkage is required.
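
The claim that less shrinkage is needed can be illustrated by a small Monte Carlo sketch that estimates the risk under loss (1.3) for $\gamma = 1$ (the order statistics), $\gamma = \gamma^*$ and $\gamma = \sqrt{\gamma^*}$. It assumes normal $F$ and $G$, $\mu = 0$, total variance normalised to 1, and arbitrary choices of $m$, $\gamma^*$, replications and seed; it is not the simulation design of the paper.

```python
import math
import random

def mc_risks(m=100, gamma_star=0.5, reps=400, seed=0):
    """Monte Carlo estimate of E{L} in (1.3) for three values of gamma.

    Assumptions (illustrative only): F, G normal, mu = 0, sigma_u^2 + sigma_e^2 = 1.
    """
    rng = random.Random(seed)
    sigma_u = math.sqrt(gamma_star)
    sigma_e = math.sqrt(1.0 - gamma_star)
    gammas = {"raw": 1.0, "star": gamma_star, "sqrt": math.sqrt(gamma_star)}
    total = {name: 0.0 for name in gammas}
    for _ in range(reps):
        theta = [rng.gauss(0.0, sigma_u) for _ in range(m)]
        y = [t + rng.gauss(0.0, sigma_e) for t in theta]
        ybar = sum(y) / m
        y_sorted, theta_sorted = sorted(y), sorted(theta)
        for name, g in gammas.items():
            total[name] += sum(
                (g * v + (1.0 - g) * ybar - t) ** 2
                for v, t in zip(y_sorted, theta_sorted)
            )
    return {name: s / reps for name, s in total.items()}

risks = mc_risks()
print(risks)
```

With these settings the estimated risk is smallest for $\gamma = \sqrt{\gamma^*}$, followed by $\gamma = \gamma^*$, with the unshrunk order statistics last.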

The following lemma is used in the proof of Theorem 3 but may be of
independent interest.

Lemma 1. Under (1.1),

$$m\sigma_u^2 \le E\sum_{i=1}^m (\theta_{(i)} - \mu)(y_{(i)} - \mu) \le m\,\sigma_u\sqrt{\sigma_u^2 + \sigma_e^2}.$$

For the predictors $\hat\theta^{[2]}_{()}(\gamma)$, it is natural to look for optimal or good values of
$\gamma$.

Theorem 4. Under (1.1), let $\gamma^\circ$ be the optimal choice of $\gamma$ for the predictor
$\hat\theta^{[2]}_{(i)}(\gamma)$ in the sense of minimizing $E\{L(\hat\theta^{[2]}_{()}(\gamma), \theta_{()})\}$. Then

$$\gamma^\circ \in \left[\gamma^*, \; \frac{m}{m-1}\sqrt{\gamma^*} - \frac{1}{m-1}\gamma^*\right]. \qquad (3.6)$$

As $m \to \infty$, the above range for the optimal $\gamma$ becomes $[\gamma^*, \sqrt{\gamma^*}]$.

Conjecture 1. The optimal $\gamma$ in the sense of Theorem 4 satisfies $\lim_{m \to \infty} \gamma^\circ = \sqrt{\gamma^*}$.

Simulations that justify Conjecture 1 are given in the Supplement.

For $m > 25$ or so, which is common in SAE, we recommend using the
predictor $\hat\theta^{[2]}_{()}(\gamma)$ with $\gamma = \sqrt{\gamma^*}$. Numerous simulations suggest that the latter
choice, or the choice of $\gamma = \gamma^\circ$, yield essentially the same results. We emphasize
that the predictor $\hat\theta^{[2]}_{()}(\sqrt{\gamma^*})$ is very easy to compute. The case of $m < 25$ is
discussed next.

3.2 An approximation for $\gamma^\circ$ in the normal case

For practical computation of $\gamma^\circ$ for $m < 25$ or so, we propose the following
approach that we have implemented in the normal case. (For $m > 25$, $\sqrt{\gamma^*}$
provides an excellent approximation to $\gamma^\circ$, see simulation results and Conjecture
1.) In view of Theorem 4 we consider the approximation formula

$$\gamma^\circ \approx \alpha\gamma^* + (1 - \alpha)\,u(m, \gamma^*), \qquad (3.7)$$

with $u(m, \gamma^*) = \frac{m}{m-1}\sqrt{\gamma^*} - \frac{1}{m-1}\gamma^*$, and $\alpha$ depending on $m$ and $\gamma^*$. For fixed
$\gamma^*$, and for each $m$ satisfying $2 \le m \le 30$ we compute $E\{L(\hat\theta^{[2]}_{()}(\gamma), \theta_{()})\}$ by
simulations and find the minimizer $\gamma^\circ$ by an exhaustive search. We then define
$\alpha_{m,\gamma^*}$ to be the solution of (3.7). For fixed $\gamma^*$ we carry out polynomial regression
of the computed values of $\alpha_{m,\gamma^*}$ on the explanatory variable $m$ in the range
$2 \le m \le 30$; this is repeated for an array of values of $\gamma^*$. It turns out that
an excellent approximation is obtained when $\alpha_{m,\gamma^*} = \alpha_m$ is taken to be only a
function of $m$ for a large range of values of $\gamma^*$. We therefore combine the different
regressions for the different values of $\gamma^*$, and obtain a polynomial approximation
for $\alpha_m$. The numerical calculations lead to the quadratic polynomial $\alpha_m = 0.8236 - 0.0573m + 0.0012m^2$.
Plugging it into (3.7) we obtain the approximation
$\tilde\gamma^\circ = \alpha_m\gamma^* + (1 - \alpha_m)\,u(m, \gamma^*)$ for $\gamma^\circ$.

Numerical simulations show that in the range $2 \le m \le 25$ and for all values
of $\gamma^*$, the resulting $\tilde\gamma^\circ$ is indeed very close to $\gamma^\circ$. In fact, for $m = 2$ they may
differ by about 10%, but for $m > 4$ they differ by about 1%-2%. Using one or
the other yields almost identical expected losses.
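
The approximation of this subsection is straightforward to code. The sketch below uses the quadratic polynomial for $\alpha_m$ quoted above and falls back to $\sqrt{\gamma^*}$ for large $m$; the exact cut-off (`m >= 25`) and the function names are assumptions made here for illustration, since the text only says "25 or so".

```python
import math

def alpha_m(m):
    """Quadratic polynomial fitted in the text for 2 <= m <= 30."""
    return 0.8236 - 0.0573 * m + 0.0012 * m ** 2

def u_upper(m, gamma_star):
    """Upper end of the range (3.6) for the optimal gamma."""
    return (m * math.sqrt(gamma_star) - gamma_star) / (m - 1)

def gamma_approx(m, gamma_star):
    """Approximate optimal shrinkage: (3.7) with alpha = alpha_m, or sqrt(gamma*) for large m."""
    if m >= 25:                       # cut-off chosen here; the text says "25 or so"
        return math.sqrt(gamma_star)
    a = alpha_m(m)
    return a * gamma_star + (1.0 - a) * u_upper(m, gamma_star)

for m in (2, 5, 10, 20, 50):
    print(m, round(gamma_approx(m, 0.25), 4))
```

By construction the approximation is a convex combination of the two ends of (3.6) whenever $0 \le \alpha_m \le 1$, which holds for the values of $m$ where the polynomial is used.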

3.3 Normal distribution of F and G and m=2 

When both $F$ and $G$ are normal and $m = 2$, we obtain tighter conclusions for
the previous results.

Theorem 5. For (1.1) with $F$ and $G$ normal and $m = 2$:

1. if $0 \le \gamma^* \le c \approx 0.4119$, then $E\{L(\hat\theta^{[2]}_{()}(\gamma), \theta_{()})\} \le E\{L(\hat\theta^{[1]}_{()}, \theta_{()})\}$ for all
$0 \le \gamma \le 1$;

2. for all $\gamma^*$ and $\gamma$ satisfying $\gamma^* \le \gamma \le 1$, $E\{L(\hat\theta^{[2]}_{()}(\gamma), \theta_{()})\} \le E\{L(\hat\theta^{[1]}_{()}, \theta_{()})\}$;

3. the optimal $\gamma$ for the predictor $\hat\theta^{[2]}_{(i)}(\gamma)$ ($i = 1, 2$) in the sense of minimizing
$E\{L(\hat\theta^{[2]}_{()}(\gamma), \theta_{()})\}$ is

$$\gamma^\circ = \gamma^*\left(4\psi(a) - 1\right) + (1 - \gamma^*)\,\frac{2}{\pi}\sqrt{\gamma^*(1 - \gamma^*)}, \qquad (3.8)$$

where $\psi(a) = \int_0^\infty t^2\,\Phi(at)\,\varphi(t)\,dt$, and $a = \sqrt{\gamma^*/(1 - \gamma^*)}$.

Thus Part 1 of Theorem 5 shows that we can replace the condition $\gamma^* \le \left(\frac{2 - \sqrt{2}}{2}\right)^2 \approx 0.086$
of Corollary 1 by $\gamma^* \le c$, $c \approx 0.4119$; Part 2 shows that (3.5)
($\gamma^* \le 1/9$ for $m = 2$) of Corollary 2 may be omitted; Part 3 of Theorem 5 gives
an exact result rather than the range given by (3.6).

Remark. The accurate definition of $c$ and its approximation are given in the
proof of Theorem 5 in the Supplement. The function $\psi(a)$ can be computed by
Matlab: double(int(x^2 * normpdf(x) * (erf(a * x/sqrt(2)) + 1)/2, x, 0, inf)).
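
For readers without Matlab, $\psi(a)$ and the expression (3.8) can be evaluated with a simple numerical integration. The sketch below uses composite Simpson's rule on a truncated range, with $\Phi$ written through the error function as in the Matlab expression. The truncation point, the number of subintervals and the function names are choices made here, and $a = \sqrt{\gamma^*/(1 - \gamma^*)}$ is assumed.

```python
import math

def Phi(x):
    """Standard normal distribution function via erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def psi(a, upper=12.0, n=4000):
    """Simpson approximation of the integral of t^2 Phi(a t) phi(t) over [0, upper]; n must be even."""
    h = upper / n
    f = lambda t: t * t * Phi(a * t) * phi(t)
    s = f(0.0) + f(upper)
    s += 4.0 * sum(f((2 * k - 1) * h) for k in range(1, n // 2 + 1))  # odd nodes
    s += 2.0 * sum(f(2 * k * h) for k in range(1, n // 2))            # interior even nodes
    return s * h / 3.0

def gamma_opt_m2(gamma_star):
    """Optimal gamma for m = 2 and normal F, G according to (3.8); requires 0 <= gamma* < 1."""
    a = math.sqrt(gamma_star / (1.0 - gamma_star))
    tail = (1.0 - gamma_star) * (2.0 / math.pi) * math.sqrt(gamma_star * (1.0 - gamma_star))
    return gamma_star * (4.0 * psi(a) - 1.0) + tail

for g in (0.1, 0.25, 0.5, 0.9):
    print(g, round(gamma_opt_m2(g), 4))
```

The printed values lie in the range (3.6) for $m = 2$, that is, between $\gamma^*$ and $2\sqrt{\gamma^*} - \gamma^*$.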

The results given so far compare $\hat\theta^{[2]}_{()}(\gamma)$ with $\hat\theta^{[1]}_{()}$ in the sense of minimizing
expected loss. In the absence of an explicit expression for $\hat\theta^{[3]}_{()}$, it is not easy to
compare it with other predictors analytically, but it is possible to do this if $F$
and $G$ are normal and $m = 2$, and the result is Theorem 6. For $m > 2$ we provide
simulations.

It is obvious that the estimator $\hat\theta_{(i)}(\mu) = E_\mu(\theta_{(i)}|y)$ minimizes the MSE. The
point of Theorem 6 is that the unknown $\mu$ is replaced in $\hat\theta_{(i)}(\mu)$ by its estimate
$\bar y$ to obtain $\hat\theta^{[3]}_{(i)}$.

Theorem 6. Consider (1.1) with $F$ and $G$ normal. Then for $m = 2$, $E\{L(\hat\theta^{[3]}_{()}, \theta_{()})\} \le E\{L(\hat\theta^{[2]}_{()}(\gamma^*), \theta_{()})\}$.

Conjecture 2. If $F$ and $G$ are normal and $m > 2$, then $E\{L(\hat\theta^{[3]}_{()}, \theta_{()})\} \le E\{L(\hat\theta^{[2]}_{()}(\gamma^\circ), \theta_{()})\}$.

Various simulations support this conjecture. Some of them are presented in
the Supplement. The simulations show that the predictor $\hat\theta^{[2]}_{()}(\gamma^\circ)$ is worse than
$\hat\theta^{[3]}_{()}$ in the sense of $E\{L(\hat\theta_{()}, \theta_{()})\}$, as suggested by Conjecture 2. However, they
are rather close, while the predictor $\hat\theta^{[2]}_{()}(\gamma^*)$ is far worse. This suggests that the
linear predictor $\hat\theta^{[2]}_{()}(\gamma^\circ)$ can be used without much loss. As mentioned above,
for $m > 25$ or so, the predictor $\hat\theta^{[2]}_{()}(\sqrt{\gamma^*})$, which is easy to calculate, is as good
as $\hat\theta^{[2]}_{()}(\gamma^\circ)$, and the calculation of $\gamma^\circ$ can be avoided. See the Supplement.

4 Unknown variances

Until now it was assumed that the variances are known. We now turn to the case
of unknown variances. This case will be studied by simulations, whose detailed
description is given in the Supplement.

We first make the common assumption in SAE that only $\sigma_u^2$ is unknown,
and later that both variances, $\sigma_u^2$ and $\sigma_e^2$, are unknown. We replace each
unknown variance by plugging in its natural estimator. For the case that only $\sigma_u^2$
is unknown, it is estimated by

$$\hat\sigma_u^2 = \max\left\{\frac{1}{m-1}\sum_{i=1}^m (y_i - \bar y)^2 - \sigma_e^2, \; 0\right\}. \qquad (4.1)$$

This approach cannot be expected to work for small values of $m$. We emphasize
again that the interest in SAE is in large $m$.

The notation for the resulting estimates remains as it was for the case of
known variances. In this case, and in the case that both variances are unknown
(Figure 1 below), we use simulations to compare the risk $E\{L(\hat\theta_{()}, \theta_{()})\}$ for the
predictors $\hat\theta^{[3]}_{()}$, $\hat\theta^{[2]}_{()}(\gamma^*)$, $\hat\theta^{[2]}_{()}(\sqrt{\gamma^*})$, and $\hat\theta^{[2]}_{()}(\gamma^\circ)$, and since in all simulations
the risks of the latter two predictors are almost identical, we present only one
of them. We also compare the performance of these predictors when only the
maximum is predicted.
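
The plug-in step for the case where only $\sigma_u^2$ is unknown can be sketched as follows: estimate $\sigma_u^2$ by the truncated moment estimator (4.1) and substitute it into $\gamma^*$. The function names and the toy data are hypothetical.

```python
def estimate_sigma_u2(y, sigma_e2):
    """Moment estimator (4.1): sample variance of y minus sigma_e^2, truncated at zero."""
    m = len(y)
    ybar = sum(y) / m
    s2 = sum((v - ybar) ** 2 for v in y) / (m - 1)
    return max(s2 - sigma_e2, 0.0)

def plug_in_gamma_star(y, sigma_e2):
    """Plug-in version of gamma* = sigma_u^2 / (sigma_u^2 + sigma_e^2)."""
    su2 = estimate_sigma_u2(y, sigma_e2)
    return su2 / (su2 + sigma_e2)

y = [0.0, 2.0, 4.0, 6.0]          # toy data; sample variance 20/3
print(estimate_sigma_u2(y, 1.0), plug_in_gamma_star(y, 1.0))
```

When the sample variance falls below $\sigma_e^2$ the estimate is truncated to zero, so the plug-in $\gamma^*$ is zero and every area is shrunk all the way to $\bar y$; this is one reason the approach needs a reasonably large $m$.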
Figure 1:

• Comparison of $E\{L(\hat\theta_{()}, \theta_{()})\}$ as a function of $\gamma^*$, for the predictors $\hat\theta^{[2]}_{()}(\gamma^*)$,
$\hat\theta^{[2]}_{()}(\sqrt{\gamma^*})$, $\hat\theta^{[3]}_{()}$ (dotted, solid, dashed lines), where $F$ and $G$ are normal and
$m = 100$, $n = 15$ (upper left), $m = 30$, $n = 15$ (upper right)

• Comparison of the MSE of $\hat\theta^{[2]}_{(m)}(\gamma^*)$, $\hat\theta^{[2]}_{(m)}(\sqrt{\gamma^*})$, $\hat\theta^{[3]}_{(m)}$ (dotted, solid, dashed
lines) for predicting $\theta_{(m)}$ as a function of $\gamma^*$, where $F$ and $G$ are normal
and $m = 100$, $n = 15$ (bottom left), $m = 30$, $n = 15$ (bottom right)

In Figure 3S (given in the Supplement) only $\sigma_u^2$ is estimated, and in Figure 1
both $\sigma_u^2$ and $\sigma_e^2$ are estimated. The figures are rather similar. The results should
be compared to those of Figure 2S (Supplement), where the variances are known.
Clearly the less one knows, the higher the loss. However, the simple shrinkage
predictor $\hat\theta^{[2]}_{()}(\sqrt{\gamma^*})$ performed almost as well as the best plug-in predictor $\hat\theta^{[3]}_{()}$
and much better than $\hat\theta^{[2]}_{()}(\gamma^*)$. Thus again we conclude that for the problem at
hand, shrinkage estimators work, provided one uses the right amount of shrinkage
for the ordered parameters problem.

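For the case of unknown $\sigma_u^2$ and $\sigma_e^2$, consider the model

$$y_{ij} = \mu + u_i + e_{ij}, \quad i = 1, \ldots, m, \; j = 1, \ldots, n,$$

which is a special case of the Nested Error Unit Level Regression model of Battese,
Harter and Fuller (1988). We apply our previous estimators, replacing the
variances by

[display not recoverable from the extraction: the estimators $\hat\sigma_e^2$ and $\hat\sigma_u^2$, written as double sums over $i = 1, \ldots, m$ and $j = 1, \ldots, n$, with a normalizing factor $1/\{m(n - 1)\}$]

and set $\hat\gamma^* = \hat\sigma_u^2/(\hat\sigma_u^2 + \hat\sigma_e^2/n)$. Simulation results are given in Figure 1.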



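5 Shrinkage-type predictor $\hat\theta^{[2]}_{()}(\gamma)$ in the non-normal case

We briefly consider non-normal $F$, whereas the error distribution $G$ remains
normal. We first take the double exponential distribution (Laplace distribution)
for the random effects $u_i$, with density $\frac{1}{2b}\exp\left(-\frac{|x|}{b}\right)$, where $b = \sqrt{\sigma_u^2/2}$. Direct
calculations show that the density function of $\theta_i$ given $y_i$ is

$$f_{\theta_i|y_i}(t|y) = \begin{cases} \dfrac{p_1(t)}{\int_{-\infty}^{\mu} p_1(t)\,dt + \int_{\mu}^{\infty} p_2(t)\,dt} & \text{if } t \le \mu, \\[2ex] \dfrac{p_2(t)}{\int_{-\infty}^{\mu} p_1(t)\,dt + \int_{\mu}^{\infty} p_2(t)\,dt} & \text{if } t > \mu, \end{cases} \qquad (5.1)$$

where $p_\ell(t) = \exp\left\{(-1)^{\ell+1}(y - \mu)/b\right\}\exp\left\{-\left(t - (y + (-1)^{\ell+1}\sigma_e^2 b^{-1})\right)^2/(2\sigma_e^2)\right\}$, $\ell = 1, 2$.
The first factor, which does not depend on $t$, comes from completing the square in
$\exp\{-|t - \mu|/b - (t - y)^2/(2\sigma_e^2)\}$ on each side of $\mu$ and is needed for the two pieces to
have the correct relative weights.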

Figure 2:

• Comparison of $E\{L(\hat\theta_{()}, \theta_{()})\}$ as a function of $\gamma^*$, for the predictors $\hat\theta^{[2]}_{()}(\gamma^*)$,
$\hat\theta^{[2]}_{()}(\gamma^\circ)$, $\hat\theta^{[3]}_{()}$ (dotted, solid, dashed lines), where $F$ is the Laplace distribution
and $G$ is normal (upper left), and where $F$ is the Location exponential
distribution and $G$ is normal (upper right), for $m = 100$.

• Comparison of the MSE of $\hat\theta^{[2]}_{(m)}(\gamma^*)$, $\hat\theta^{[2]}_{(m)}(\gamma^\circ)$, $\hat\theta^{[3]}_{(m)}$ (dotted, solid, dashed
lines) for predicting $\theta_{(m)}$, as a function of $\gamma^*$, where $F$ is the Laplace distribution
and $G$ is normal (bottom left), and where $F$ is the Location exponential
distribution and $G$ is normal (bottom right), for $m = 100$.

We also take a location exponential distribution for the random effects $u_i$, with
density $\frac{1}{b}\exp\left(-\frac{u_i - a}{b}\right) I(u_i > a)$, where $b = \sigma_u$, $a = -b$. By direct calculation the
density function of $\theta_i$ given $y_i$ is

$$f_{\theta_i|y_i}(t|y) = \frac{\exp\left[-\left(t - (y - \sigma_e^2/\sigma_u)\right)^2/(2\sigma_e^2)\right] I(t > \mu + a)}{\int_{\mu + a}^{\infty} \exp\left[-\left(t - (y - \sigma_e^2/\sigma_u)\right)^2/(2\sigma_e^2)\right] dt}.$$

The simulations (Figure 2) were done as in Figure 2S ($m = 100$), except that
for each value of $\gamma^*$ we ran 100 simulations and generated 100 random variables
from $f_{\theta_i|y_i}(\cdot|\cdot)$, sorted them, and approximated $\hat\theta^{[3]}_{(i)}$.

We can see in Figure 2 that for the symmetric but heavy-tailed Laplace
distribution, our shrinkage type predictor $\hat\theta^{[2]}_{()}(\gamma^\circ)$ (and the same is true for
$\hat\theta^{[2]}_{()}(\sqrt{\gamma^*})$) is close to the empirical best predictor $\hat\theta^{[3]}_{()}$, but in the asymmetric
case of the Location Exponential distribution, this does not happen.



6 Robustness and comparison with Shen and Louis (1998)

Shen and Louis (1998; henceforth SL) proposed predictors called "Triple-goal 
estimates" for random effects in two-stage hierarchical models. Their method is in 
general not analytically tractable, and requires numerical calculations. Moreover, 
being sensitive to Bayesian assumptions, it is not robust (Shen and Louis (2000)). 

The first stage of SL is minimizing $E\int\{A(t; y) - G_m(t)\}^2\,dt$ with the constraint
that $A$ is a discrete distribution with at most $m$ mass points, where
$G_m(t)$ is the 'empirical' distribution function $G_m(t) = \frac{1}{m}\sum_{i=1}^m I(\theta_i \le t)$. They
show that the solution $\hat A$ is the empirical distribution of $\hat U = (\hat U_1, \ldots, \hat U_m)$,
$\hat U_j = \bar G_m^{-1}\left(\frac{2j - 1}{2m}\right)$, where $\bar G_m(t) = E(G_m(t)|y) = \frac{1}{m}\sum_{k=1}^m P(\theta_k \le t|y_k)$. Therefore
$\hat U_j$ is a predictor of $\theta_{(j)}$, $j = 1, \ldots, m$. The solution $\hat U_j = \bar G_m^{-1}\left(\frac{2j - 1}{2m}\right)$ depends
on the posterior distributions of $\theta_1, \ldots, \theta_m$ and requires estimation of unknown
parameters and a solution of nonlinear equations. In order to compute $\bar G_m(t)$ in
our simulations, we compute $P(\theta_k \le t|y_k)$ using the plug-in (or moment) estimator
$\bar y$ of $\mu$, and (1.1) with the assumption that $F$ and $G$ are normal, and apply
Matlab function 'fzero' for the solution $t = \hat U_j$ of the equations $\bar G_m(t) = \frac{2j - 1}{2m}$.
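
The computation just described can be sketched without Matlab. The code below assumes normal $F$ and $G$ with known variances, so that the plug-in posterior of $\theta_k$ given $y_k$ is normal with mean $\gamma^* y_k + (1 - \gamma^*)\bar y$ and variance $\gamma^*\sigma_e^2$, and it replaces 'fzero' by plain bisection. The function names, the bracketing interval and the tolerance are choices made here.

```python
import math

def Phi(x):
    """Standard normal distribution function via erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sl_ordered_predictors(y, sigma_u2, sigma_e2, tol=1e-10):
    """Solve G_bar(t) = (2j - 1) / (2m), j = 1..m, under a plug-in normal posterior.

    Assumes sigma_u2 > 0; bisection stands in for Matlab's fzero.
    """
    m = len(y)
    ybar = sum(y) / m
    gamma_star = sigma_u2 / (sigma_u2 + sigma_e2)
    post_mean = [gamma_star * v + (1.0 - gamma_star) * ybar for v in y]
    post_sd = math.sqrt(gamma_star * sigma_e2)

    def G_bar(t):
        # Average of the m posterior distribution functions evaluated at t
        return sum(Phi((t - mk) / post_sd) for mk in post_mean) / m

    lo0 = min(post_mean) - 10.0 * post_sd   # G_bar is essentially 0 here
    hi0 = max(post_mean) + 10.0 * post_sd   # G_bar is essentially 1 here
    out = []
    for j in range(1, m + 1):
        target = (2 * j - 1) / (2 * m)
        lo, hi = lo0, hi0
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if G_bar(mid) < target:
                lo = mid
            else:
                hi = mid
        out.append(0.5 * (lo + hi))
    return out

U = sl_ordered_predictors([0.0, 1.0, 2.0, 3.0], sigma_u2=1.0, sigma_e2=1.0)
print([round(u, 4) for u in U])
```

Because $\bar G_m$ is increasing, the solutions are automatically ordered; each one requires a root search over an average of $m$ distribution functions, which is the source of the computational cost mentioned above.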

For the purpose of checking robustness we generated data taking $F$ to be
the Laplace distribution or the asymmetric location exponential distribution,
and a normal $G$. The simulations were done as in Figure 2S ($m = 100$), except
that in the stage of prediction we ignored the true distribution of the random
effects and used the normal distribution. Here we compared $\hat\theta^{[2]}_{()}(\sqrt{\gamma^*})$, $\hat\theta^{[3]}_{()}$, and
the predictor $\hat U$ based on SL. Note that, unlike the estimators in SL, it is not
necessary to know the distributions for the predictor $\hat\theta^{[2]}_{()}(\sqrt{\gamma^*})$.




Figure 3:

• Comparison of $E\{L(\hat\theta_{()}, \theta_{()})\}$ as a function of $\gamma^*$, for the predictors $\hat U$,
$\hat\theta^{[2]}_{()}(\sqrt{\gamma^*})$, $\hat\theta^{[3]}_{()}$ (red, black, green lines), where $F$ is the Laplace distribution
and $G$ is normal (upper left) and where $F$ is the Location exponential
distribution and $G$ is normal (upper right), for $m = 100$.

• Comparison of the MSE of $\hat U_m$, $\hat\theta^{[2]}_{(m)}(\sqrt{\gamma^*})$, $\hat\theta^{[3]}_{(m)}$ (red, black, green lines)
for predicting $\theta_{(m)}$, as a function of $\gamma^*$, where $F$ is the Laplace distribution
and $G$ is normal (bottom left) and where $F$ is the Location exponential
distribution and $G$ is normal (bottom right), for $m = 100$.

In general, the SL estimators and $\hat\theta^{[3]}_{()}$ exhibited very similar performance,
see Figure 3. Under the correct assumptions they were somewhat better than
our predictor $\hat\theta^{[2]}_{()}(\sqrt{\gamma^*})$ (Figure 2); however, they are usually computationally
intensive and non-robust against model misspecification. Under misspecification
of the distributions in the model, it turned out that $\hat\theta^{[2]}_{()}(\sqrt{\gamma^*})$, which does not
depend on the assumed model, was better, as can be seen from the simulations of
Figure 3.

7 Appendix: Proofs 

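Proof of Theorem 3. Without loss of generality take $\mu = 0$. We have

$$E\{L(\hat\theta^{[2]}_{()}(\gamma), \theta_{()})\} = E\sum_{i=1}^m \left(\gamma y_{(i)} + (1 - \gamma)\bar y - \theta_{(i)}\right)^2$$
$$= E\sum_{i=1}^m \left\{\gamma y_{(i)} + (1 - \gamma)\bar y - y_{(i)} + y_{(i)} - \theta_{(i)}\right\}^2$$
$$= E\sum_{i=1}^m (y_{(i)} - \theta_{(i)})^2 + (1 - \gamma)^2 E\sum_{i=1}^m (y_{(i)} - \bar y)^2 - 2(1 - \gamma) E\sum_{i=1}^m (y_{(i)} - \theta_{(i)})(y_{(i)} - \bar y).$$

Therefore,

$$D(\gamma) := E\{L(\hat\theta^{[2]}_{()}(\gamma), \theta_{()})\} - E\{L(\hat\theta^{[1]}_{()}, \theta_{()})\} = (1 - \gamma)^2 E\sum_{i=1}^m (y_{(i)} - \bar y)^2 - 2(1 - \gamma) E\sum_{i=1}^m (y_{(i)} - \theta_{(i)})(y_{(i)} - \bar y).$$

We calculate each part separately. First note that $E\sum_{i=1}^m (y_{(i)} - \bar y)^2 = E\sum_{i=1}^m (y_i - \bar y)^2 = (\sigma_u^2 + \sigma_e^2)(m - 1)$. Now

$$E\sum_{i=1}^m (y_{(i)} - \theta_{(i)})(y_{(i)} - \bar y) = E\sum_{i=1}^m y_{(i)}^2 - E\sum_{i=1}^m y_{(i)}\bar y - E\sum_{i=1}^m \theta_{(i)} y_{(i)} + E\sum_{i=1}^m \theta_{(i)}\bar y$$
$$= m(\sigma_u^2 + \sigma_e^2) - \sigma_e^2 - E\sum_{i=1}^m \theta_{(i)} y_{(i)}. \qquad (7.2)$$

Summarizing the above we have

$$D(\gamma) = (1 - \gamma)^2(\sigma_u^2 + \sigma_e^2)(m - 1) - 2(1 - \gamma)\left(m(\sigma_u^2 + \sigma_e^2) - \sigma_e^2 - E\sum_{i=1}^m \theta_{(i)} y_{(i)}\right). \qquad (7.3)$$

From Lemma 1 (to be proved later), with $\mu = 0$,

$$m\sqrt{\sigma_u^2(\sigma_u^2 + \sigma_e^2)} \ge E\sum_{j=1}^m \theta_{(j)} y_{(j)} \ge m\sigma_u^2. \qquad (7.4)$$

We use the first inequality to deduce that for $\gamma \le 1$,

$$D(\gamma) \le (1 - \gamma)^2(\sigma_u^2 + \sigma_e^2)(m - 1) - 2(1 - \gamma)\left(m(\sigma_u^2 + \sigma_e^2) - \sigma_e^2 - m\sqrt{\sigma_u^2(\sigma_u^2 + \sigma_e^2)}\right).$$

Equating the right-hand side to zero and solving the quadratic equation in $1 - \gamma$, it
is easy to see that $D(\gamma) \le 0$ in the interval $\left[\frac{m}{m-1}(2\sqrt{\gamma^*} - 1) - \frac{1}{m-1}(2\gamma^* - 1), \; 1\right]$,
and the result follows. □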
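Proof of Corollary 1. Clearly, $E\{L(\hat\theta^{[2]}_{()}(\gamma), \theta_{()})\} \le E\{L(\hat\theta^{[1]}_{()}, \theta_{()})\}$ for all $0 \le \gamma \le 1$,
if $\frac{m}{m-1}(2\sqrt{\gamma^*} - 1) - \frac{1}{m-1}(2\gamma^* - 1) \le 0$. Solving the quadratic equation in
$\sqrt{\gamma^*}$, we see that the latter inequality holds if either (i) $\gamma^* \le \left(\frac{m - \sqrt{m^2 - 2m + 2}}{2}\right)^2$
or (ii) $\gamma^* \ge \left(\frac{m + \sqrt{m^2 - 2m + 2}}{2}\right)^2$. Since $\gamma^* \le 1$ the only possibility is (i),
and the proof is complete. □

Proof of Corollary 2. From Theorem 3 it is clear that $E\{L(\hat\theta^{[2]}_{()}(\gamma), \theta_{()})\} \le E\{L(\hat\theta^{[1]}_{()}, \theta_{()})\}$
if $\frac{m}{m-1}(2\sqrt{\gamma^*} - 1) - \frac{1}{m-1}(2\gamma^* - 1) \le \gamma^* \le \gamma \le 1$. The first
inequality is equivalent to $\gamma^* \le (m - 1)^2/(m + 1)^2$ or $\gamma^* \ge 1$. The case $\gamma^* = 1$ is trivial
because in this case $\hat\theta^{[1]}_{()} = \hat\theta^{[2]}_{()}(\gamma^*)$. □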
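Proof of Lemma 1. Take $\mu = 0$ without loss of generality. The lower bound is a result of the rearrangement inequality

$$\sum_{i=1}^m \theta_{(i)} y_{(i)} \ge \sum_{i=1}^m \theta_i y_i.$$

The upper bound follows from

$$E\sum_{i=1}^m \theta_{(i)} y_{(i)} \le \left(E\sum_{i=1}^m \theta_{(i)}^2 \; E\sum_{i=1}^m y_{(i)}^2\right)^{1/2},$$

where the inequality follows from the Cauchy-Schwarz inequality. □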
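Proof of Theorem 4. By the calculations of Theorem 3,

\[
E\{L(\hat{\theta}^{[2]}_{()}(\gamma), \theta_{()})\} = E\sum_{i=1}^{m} (y_{(i)} - \theta_{(i)})^2
+ (1-\gamma)^2(\sigma_u^2 + \sigma_e^2)(m-1) - 2(1-\gamma) E\sum_{i=1}^{m} (y_{(i)} - \theta_{(i)})(y_{(i)} - \bar{y}).
\]

Hence, $\partial E\{L(\hat{\theta}^{[2]}_{()}(\gamma), \theta_{()})\}/\partial\gamma = 0$ if and only if
$\gamma = 1 - \frac{E\sum_{i=1}^{m} (y_{(i)} - \theta_{(i)})(y_{(i)} - \bar{y})}{(\sigma_u^2 + \sigma_e^2)(m-1)}$,
which is a minimum by convexity. We cannot calculate the latter expression
exactly, yet the bounds of (7.4) imply the result readily. □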
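To make the last step concrete, the bounds of (7.4) can be substituted into the stationary point via (7.2). The sketch below is illustrative only; the function name is hypothetical, and we assume (without the statement of Theorem 4 at hand) that the resulting interval is the range of optimal $\gamma$ that the theorem refers to.

```python
import math


def gamma_opt_bounds(m: int, sigma_u2: float, sigma_e2: float) -> tuple:
    """Bounds on the minimizing gamma implied by (7.2) and (7.4)."""
    s2 = sigma_u2 + sigma_e2

    def gamma_from_t(t: float) -> float:
        # t stands for E sum theta_(i) y_(i); (7.2) gives the cross term.
        cross = m * s2 - sigma_e2 - t
        return 1.0 - cross / ((m - 1) * s2)

    t_low = m * sigma_u2                    # rearrangement lower bound
    t_high = m * math.sqrt(sigma_u2 * s2)   # Cauchy-Schwarz upper bound
    return gamma_from_t(t_low), gamma_from_t(t_high)


# gamma* = 1 / (1 + 3) = 0.25 here.
print(gamma_opt_bounds(2, 1.0, 3.0))
print(gamma_opt_bounds(1000, 1.0, 3.0))
```

In terms of $\gamma^*$ the two ends simplify to $\gamma^*$ and $(m\sqrt{\gamma^*} - \gamma^*)/(m-1)$, and the upper end tends to $\sqrt{\gamma^*}$ as $m$ grows.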






Acknowledgment We thank Danny Pfeffermann for discussions that led to the 
formulation of the problems studied in this paper. An Associate Editor and the 
referees made comments that resulted in significant improvements in the paper. 
This paper is dedicated to the memory of Gideon Schwarz, a teacher and a friend. 

This research was supported in part by grant number 473/04 from the Israel 
Science Foundation. 

References 

Battese, G. E., Harter, R. M., and Fuller, W. A. (1988). An error component
model for prediction of county crop areas using survey and satellite data.
Journal of the American Statistical Association 83, 28-36.

Blumenthal, S. and Cohen, A. (1968). Estimation of the larger of two normal 
means. Journal of the American Statistical Association 63, 861-876. 

David, H. A. and Nagaraja, H. N. (2003). Order Statistics (third edition).
Wiley, New York.

Dawid, A. P. (1994). Selection paradoxes of Bayesian inference. In Multivariate
Analysis and its Applications 24 (eds. T. W. Anderson, K.-T. Fang and
I. Olkin). Philadelphia, PA: IMS.

Fay, R. E. and Herriot, R. A. (1979). Estimates of income for small places:
An application of James-Stein procedures to census data. Journal of the
American Statistical Association 74, 269-277.

Ghosh, M. (1992). Constrained Bayes estimates with application. Journal of 
the American Statistical Association 87, 533-540. 

Kella, O. (1986). On the distribution of the maximum of bivariate normal
random variables with general means and variances. Commun. Statist.-Theory
Meth. 15, 3265-3276.



Louis, T. A. (1984). Estimating a population of parameter values using Bayes
and empirical Bayes methods. Journal of the American Statistical
Association 79, 393-398.






Pfeffermann, D. (2002). Small area estimation: new developments and
directions. International Statistical Review 70, 125-143.

Rao, J. N. K. (2003). Small Area Estimation. Wiley, New York. 

Robinson, G. K. (1991). That BLUP is a good thing: the estimation of random 
effects. Statistical Science 6, 15-32. 

Rinott, Y. and Samuel-Cahn, E. (1994). Covariance between variables and 
their order statistics for multivariate normal variables. Statist. Probab. 
Lett. 21, 153-155. 

Senn, S. (2008). A note concerning a selection paradox of Dawid's. The
American Statistician 62, 206-210.

Shen, W. and Louis, T. A. (1998). Triple-goal estimates in two-stage hierarchical 
models. Journal of the Royal Statistical Society B 60, 455-471. 

Shen, W. and Louis, T. A. (2000). Triple-goal estimates for disease mapping.
Statistics in Medicine 19, 2295-2308.

Schwarz, G. (1987). A minimax property of linear regression. Journal of the 
American Statistical Association 82, 220. 

Siegel, A. F. (1993). A surprising covariance involving the minimum of
multivariate normal variables. Journal of the American Statistical Association
88, 77-80.

Stein, C. (1956). Inadmissibility of the usual estimator for the mean of a
multivariate normal distribution. Proc. Third Berkeley Symp. Math. Statist.
Probability 1, 197-206. University of California Press, Berkeley, CA.

Wright, D. L., Stern, H. S., and Cressie, N. (2003). Loss functions for estimation
of extrema with an application to disease mapping. The Canadian Journal
of Statistics 31, 251-266.

first author affiliation 
E-mail: (msyakov@mscc.huji.ac.il) 
second author affiliation 
E-mail: (rinott@mscc.huji.ac.il) 






A Supplement to Prediction of Ordered Random Effects in a Simple Small Area Model 



In this Supplement we provide some of the simulations and technical proofs.
Equations in this Supplement are indicated by S, e.g., (3.1S), and similarly,
lemmas that appear only in the Supplement are numbered with S, e.g., Lemma
1S. Equations, lemmas, and theorems without S refer to the article itself.
Most of the notation is defined in the article, and this Supplement cannot be
read independently.

8 Simulations for Conjecture 3

Conjecture 3. The optimal $\gamma$ in the sense of Theorem 4, $\gamma^o$, satisfies



We justify Conjecture 3 by simulations. First we consider the case that both
the area random effect $u_i$ and the sampling error $e_i$ have normal distributions,
and take $m = 5, 10, 20, 100$, and then repeat the simulation with $e_i$ having a
translated exponential distribution. The red lines in Figure 1S are the range
(3.6) of optimal $\gamma$ from Theorem 4 and the blue line is the optimal $\gamma$, both as
functions of $\gamma^*$. The simulations were done as follows: we set $\sigma_u^2 = 1$. Different
values of $\sigma_e^2$ define the different values of $\gamma^*$. Setting without loss of generality
$\mu = 0$, we generated $y_i = \mu + u_i + e_i$, $i = 1, \ldots, m$. For each value of $\gamma^*$ we ran 1,000
simulations. By suitably averaging over these simulations, we then approximated
$E\{L(\hat{\theta}^{[2]}_{()}(\gamma), \theta_{()})\}$ for each $\gamma \in [0, 1]$ on a grid with step-size
0.001 and found, by exhaustive search, $\gamma^o$, the value of $\gamma$ that minimizes
$E\{L(\hat{\theta}^{[2]}_{()}(\gamma), \theta_{()})\}$.
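The procedure just described can be sketched as follows. This is an illustration under our own assumptions, not the code used for Figure 1S: the function name and arguments are hypothetical, only the normal case is covered, and "suitably averaging" is implemented through the decomposition from the proof of Theorem 3, which holds replicate by replicate, so that three averaged sums give the simulated loss for every $\gamma$ on the grid.

```python
import math
import random


def simulate_gamma_opt(m, sigma_u2, sigma_e2, n_rep=1000, step=0.001, seed=0):
    """Grid-search the gamma minimizing the simulated loss (normal u_i and e_i, mu = 0)."""
    rng = random.Random(seed)
    sd_u, sd_e = math.sqrt(sigma_u2), math.sqrt(sigma_e2)
    a_sum = b_sum = c_sum = 0.0
    for _ in range(n_rep):
        theta = [rng.gauss(0.0, sd_u) for _ in range(m)]
        y = [t + rng.gauss(0.0, sd_e) for t in theta]
        y_bar = sum(y) / m
        # Pair the i-th smallest y with the i-th smallest theta.
        for ys, ts in zip(sorted(y), sorted(theta)):
            a_sum += (ys - ts) ** 2
            b_sum += (ys - y_bar) ** 2
            c_sum += (ys - ts) * (ys - y_bar)
    a, b, c = a_sum / n_rep, b_sum / n_rep, c_sum / n_rep

    def avg_loss(g):
        # Average over replicates of sum_i (g*y_(i) + (1-g)*y_bar - theta_(i))^2.
        return a + (1.0 - g) ** 2 * b - 2.0 * (1.0 - g) * c

    grid = [k * step for k in range(int(round(1.0 / step)) + 1)]
    best = min(grid, key=avg_loss)
    return best, avg_loss(best)


# gamma* = 0.5 with m = 5 and 1,000 replicates, as in the text.
print(simulate_gamma_opt(5, 1.0, 1.0))
```

A translated exponential sampling error could be obtained by replacing the second `gauss` call with a centered `expovariate` draw; that variant is not shown here.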



Figure 1S: $\gamma^o$ (the optimal $\gamma$) as a function of $\gamma^*$ (blue line) and the range of
optimal $\gamma$ from Theorem 4 as a function of $\gamma^*$ (red lines) when:

1. Both the area random effect $u_i$ and the sampling error $e_i$ are normal (left
four graphs).

2. The area random effects $u_i$ are normal, but the sampling errors $e_i$ are from
a location exponential distribution (an exponential distribution translated
by a constant) (right four graphs).



9 Simulations for Conjecture 2, and comparison of 
predictors 

9.1 Known variances 

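For normal $F$ and $G$, Conjecture 2 says that the predictor $\hat{\theta}^{[3]}_{()}$ is better than
$\hat{\theta}^{[2]}_{()}(\gamma)$ for all values of $\gamma$ (including the optimal) in the sense that
$E\{L(\hat{\theta}^{[3]}_{()}, \theta_{()})\} \le E\{L(\hat{\theta}^{[2]}_{()}(\gamma), \theta_{()})\}$.
Recall that $E\{L(\hat{\theta}^{[2]}_{()}(\gamma^o), \theta_{()})\} \le E\{L(\hat{\theta}^{[2]}_{()}(\gamma), \theta_{()})\}$ for all
$\gamma$.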

The simulations below support Conjecture 2. Figure 2S shows a sample
of simulation results for $m = 30$ and 100. We compare the expected loss in
predicting $\theta_{()}$ by $\hat{\theta}^{[2]}_{()}(\gamma^o)$ to that of $\hat{\theta}^{[3]}_{()}$. While doing these simulations, we also
compared the expected loss in predicting $\theta_{(m)}$ by $\hat{\theta}^{[2]}_{(m)}(\gamma^o)$ to that of $\hat{\theta}^{[3]}_{(m)}$.

The simulations show that the expected losses of the predictors $\hat{\theta}^{[2]}_{()}(\gamma^o)$ and
$\hat{\theta}^{[3]}_{()}$ are rather close, while the predictor $\hat{\theta}^{[2]}_{()}(\gamma^*)$ is far worse. This suggests that
the linear predictor $\hat{\theta}^{[2]}_{()}(\gamma^o)$ can be used without much loss. It is important to
note that given $\gamma^o$ this estimator is easy to calculate. For large $m$ one may take
$\gamma^o = \sqrt{\gamma^*}$, whereas for small $m$, the approximation of Section 3.2 can be used.

The simulation was done as follows: we set $\sigma_u^2 = 1$. Different values of $\sigma_e^2$
define the different values of $\gamma^*$. Setting $\mu = 0$, we generated $y_i = \mu + u_i + e_i$,
$i = 1, \ldots, m$. For each value of $\gamma^*$ we ran 1,000 simulations and approximated
$E\{L(\hat{\theta}^{[2]}_{()}(\gamma), \theta_{()})\}$ for each $\gamma$ in the range (3.6). Using an exhaustive search
with step-size of 0.001 we found $\gamma^o$, the minimizer of $E\{L(\hat{\theta}^{[2]}_{()}(\gamma), \theta_{()})\}$. We
approximated $\hat{\theta}^{[3]}_{(i)}$ in the following way: when both $F$ and $G$ are normal,
$\theta_i \mid y_i \sim N\big(\gamma^* y_i + (1 - \gamma^*)\mu,\ \gamma^*\sigma_e^2\big)$. Hence, for each $y_i$, $i = 1, \ldots, m$, we generated 1,000
random variables from $N\big(\gamma^* y_i + (1 - \gamma^*)\bar{y},\ \gamma^*\sigma_e^2\big)$, sorted them, and
approximated $\hat{\theta}^{[3]}_{(i)}$. We approximated $E\{L(\hat{\theta}^{[3]}_{()}, \theta_{()})\}$ in the same way as we
approximated $E\{L(\hat{\theta}^{[2]}_{()}(\gamma), \theta_{()})\}$.
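The approximation of $\hat{\theta}^{[3]}_{(i)}$ described above can be sketched as follows. We read "generated 1,000 random variables, sorted them" as drawing 1,000 vectors from the conditional distribution, sorting each vector, and averaging rank by rank; this reading, the function name, and the arguments are our assumptions rather than details given in the text.

```python
import math
import random


def approx_theta3(y, gamma_star, sigma_e2, n_draws=1000, seed=0):
    """Monte Carlo approximation of E(theta_(i) | y), i = 1..m, in the normal case."""
    rng = random.Random(seed)
    m = len(y)
    y_bar = sum(y) / m
    sd = math.sqrt(gamma_star * sigma_e2)
    # Conditional means, with the unknown mu replaced by y_bar as in the text.
    means = [gamma_star * yi + (1.0 - gamma_star) * y_bar for yi in y]
    totals = [0.0] * m
    for _ in range(n_draws):
        draw = sorted(rng.gauss(mu, sd) for mu in means)
        for i, value in enumerate(draw):
            totals[i] += value
    return [t / n_draws for t in totals]


# Three areas with gamma* = 0.5 and sigma_e^2 = 1.
print(approx_theta3([-1.0, 0.0, 2.0], 0.5, 1.0))
```

Because every draw is sorted before averaging, the returned vector is itself nondecreasing, and its mean is close to $\bar{y}$.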




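Figure 2S:

• Comparison of $E\{L(\hat{\theta}_{()}, \theta_{()})\}$ as a function of $\gamma^*$, for the predictors $\hat{\theta}^{[2]}_{()}(\gamma^*)$,
$\hat{\theta}^{[2]}_{()}(\gamma^o)$, $\hat{\theta}^{[3]}_{()}$ (red, black, green lines), where $F$ and $G$ are normal and
$m = 100$ (upper left), $m = 30$ (upper right)

• Comparison of the MSE of $\hat{\theta}^{[2]}_{(m)}(\gamma^*)$, $\hat{\theta}^{[2]}_{(m)}(\gamma^o)$, $\hat{\theta}^{[3]}_{(m)}$ (red, black, green lines)
for predicting $\theta_{(m)}$ as a function of $\gamma^*$, where $F$ and $G$ are normal and
$m = 100$ (bottom left), $m = 30$ (bottom right)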

9.2 Unknown variances 

Figure 3S compares the risks when only $\sigma_u^2$ is unknown and its estimator (4.1)
is plugged in. Otherwise, the simulations are similar to those of the previous
section. The case that both variances, $\sigma_u^2$ and $\sigma_e^2$, are unknown is considered in
Section 4 in the article.
Figure 3S:

• Comparison of $E\{L(\hat{\theta}_{()}, \theta_{()})\}$ as a function of $\gamma^*$, for the predictors $\hat{\theta}^{[2]}_{()}(\gamma^*)$,
$\hat{\theta}^{[2]}_{()}(\sqrt{\gamma^*})$, $\hat{\theta}^{[3]}_{()}$ (red, black, green lines), where $F$ and $G$ are normal and
$m = 100$ (upper left), $m = 30$ (upper right)

• Comparison of the MSE of $\hat{\theta}^{[2]}_{(m)}(\gamma^*)$, $\hat{\theta}^{[2]}_{(m)}(\sqrt{\gamma^*})$, $\hat{\theta}^{[3]}_{(m)}$ (red, black, green
lines) for predicting $\theta_{(m)}$ as a function of $\gamma^*$, where $F$ and $G$ are normal
and $m = 100$ (bottom left), $m = 30$ (bottom right)



10 Proofs 

10.1 Proof of Theorem 5 

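For the proof of Theorem 5 we need some further lemmas. In the sequel, $I$ denotes
an indicator function, and $\varphi$ and $\Phi$ denote the standard normal density and cdf.
Lemma 1S. Set $\psi(a) := \int_0^\infty t^2\Phi(at)\varphi(t)\,dt$, $Q_1(a) = \frac{1}{4} + \big(\frac{1}{8} + \frac{1}{4\pi}\big) I(a \ge 1)$,
and $Q_2(a) = \frac{1}{4} I(a = 0) + \big(\frac{3}{8} + \frac{a}{4\pi}\big) I\big(0 < a < \frac{\pi}{2}\big) + \frac{1}{2} I\big(a \ge \frac{\pi}{2}\big)$. Then

$Q_1(a) \le \psi(a) \le Q_2(a)$ for all $a \ge 0$, with equalities for $a = 0$, $a = 1$.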

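Proof. Note that $\psi(a) = \int_0^\infty t^2\Phi(at)\varphi(t)\,dt$ is increasing in $a$, and thus for
$0 \le a \le \infty$, we have $1/4 = \psi(0) \le \psi(a) \le \psi(\infty) = 1/2$. A simple calculation
shows that $\psi(1) = \frac{1}{4} + \big(\frac{1}{4\pi} + \frac{1}{8}\big)$, and the lower bound follows.

The upper bound follows readily once we show that for $a \ge 0$, $\psi(a) \le
\big(\frac{3}{8} + \frac{a}{4\pi}\big)$. We need the latter inequality only for $0 < a < \frac{\pi}{2}$ since for $a > \pi/2$, 1/2
is a better upper bound. (In fact 1/2 is a good bound since for $a > 1$,
$\psi(a) \ge \psi(1) = \frac{1}{4} + \big(\frac{1}{4\pi} + \frac{1}{8}\big) \approx 0.4546$.)

To show $\psi(a) \le \big(\frac{3}{8} + \frac{a}{4\pi}\big)$ for $a \ge 0$ we compute Taylor's expansion around
$a = 1$,

\[
\Phi(at) = \Phi(t) + t\varphi(t)(a - 1) - \frac{a^* t^3}{2}\varphi(a^* t)(a - 1)^2,
\]

with $a^*$ between 1 and $a$. It follows that

\[
\Phi(at) \le \Phi(t) + t\varphi(t)(a - 1), \quad \text{for } t \ge 0 \text{ and } a \ge 0.
\]

Therefore,

\[
\begin{aligned}
\psi(a) = \int_0^\infty t^2\Phi(at)\varphi(t)\,dt
&\le \int_0^\infty t^2\Phi(t)\varphi(t)\,dt + (a - 1)\int_0^\infty t^3\varphi^2(t)\,dt \\
&= \Big(\frac{1}{4\pi} + \frac{3}{8}\Big) + \frac{a - 1}{4\pi} = \Big(\frac{3}{8} + \frac{a}{4\pi}\Big) \quad \text{for all } a \ge 0. \qquad □
\end{aligned}
\]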
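The bounds of Lemma 1S are easy to check numerically. The sketch below is only a sanity check under our own choices: $\psi$ is approximated by a composite Simpson rule truncated at $t = 12$, and the helper names `psi`, `q1` and `q2` are hypothetical.

```python
import math


def phi(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def Phi(t):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def psi(a, upper=12.0, n=4000):
    """Simpson approximation of the integral of t^2 Phi(a t) phi(t) over [0, upper]."""
    h = upper / n  # n must be even
    total = 0.0
    for k in range(n + 1):
        t = k * h
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight * t * t * Phi(a * t) * phi(t)
    return total * h / 3.0


def q1(a):
    return 0.25 + (0.125 + 1.0 / (4.0 * math.pi)) * (1.0 if a >= 1 else 0.0)


def q2(a):
    if a == 0:
        return 0.25
    if a < math.pi / 2:
        return 0.375 + a / (4.0 * math.pi)
    return 0.5


for a in (0.0, 0.5, 1.0, 1.5, 3.0):
    print(a, q1(a), psi(a), q2(a))
```

The three columns should be ordered from smallest to largest in every row, with all three coinciding at $a = 0$ and $a = 1$.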

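Lemma 2S. Let $Z \sim N(0, 1)$. Then $2Q_1(a) - \frac{1}{2} \le E\big(|Z| Z \Phi(aZ)\big) \le 2Q_2(a) - \frac{1}{2}$.
Equalities hold when $a = 0$ or $a = 1$.

Proof.

\[
\begin{aligned}
\int_{-\infty}^{\infty} |t|\, t\, \Phi(at)\varphi(t)\,dt
&= \int_0^\infty t^2\Phi(at)\varphi(t)\,dt - \int_{-\infty}^{0} t^2\Phi(at)\varphi(t)\,dt \\
&= 2\int_0^\infty t^2\Phi(at)\varphi(t)\,dt - \frac{1}{2} = 2\psi(a) - \frac{1}{2}.
\end{aligned}
\tag{3.1S}
\]

The result now follows from Lemma 1S. □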



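Lemma 3S. For Model (1.1) with $F$ and $G$ normal, $m = 2$, and $\mu = 0$,

\[
E\big(\theta_{(2)} y_{(2)}\big) \le 2\sigma_u^2 Q_2(a) + \frac{\sigma_e^2}{\pi}\sqrt{\gamma^*(1 - \gamma^*)}
\]

and

\[
E\big(\theta_{(2)} y_{(2)}\big) \ge 2\sigma_u^2 Q_1(a) + \frac{\sigma_e^2}{\pi}\sqrt{\gamma^*(1 - \gamma^*)},
\]

where $a = \sqrt{\frac{\gamma^*}{1 - \gamma^*}}$.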

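Proof. Kella (1986) (see also David and Nagaraja 2003) shows that

\[
E(\theta_{(2)} \mid y) = \Phi(\Delta)\mu_1 + \Phi(-\Delta)\mu_2 + \sqrt{2}\,\sigma\varphi(\Delta), \tag{3.2S}
\]

where $\Delta = \gamma^*\frac{y_1 - y_2}{\sqrt{2}\,\sigma}$, $\sigma^2 = \gamma^*\sigma_e^2$, $\mu_i = \gamma^* y_i$, $i = 1, 2$. Therefore,

\[
E\big(\theta_{(2)} y_{(2)}\big) = E\big(y_{(2)} E(\theta_{(2)} \mid y)\big) = E\Big(y_{(2)}\big(\Phi(\Delta)\mu_1 + \Phi(-\Delta)\mu_2 + \sqrt{2}\,\sigma\varphi(\Delta)\big)\Big). \tag{3.3S}
\]

Since $\Phi(-\Delta) = 1 - \Phi(\Delta)$, the right-hand side splits into the three terms
$\gamma^* E\big(y_{(2)}\Phi(\Delta)(y_1 - y_2)\big)$, $\gamma^* E\big(y_{(2)} y_2\big)$, and $\sqrt{2}\,\sigma E\big(y_{(2)}\varphi(\Delta)\big)$.
We now calculate the latter three terms. For the first we use the relation
$y_{(2)} = \frac{y_1 + y_2}{2} + \frac{|y_1 - y_2|}{2}$. We have

\[
\begin{aligned}
E\big(y_{(2)}\Phi(\Delta)(y_1 - y_2)\big)
&= E\Big(\frac{y_1 + y_2}{2}\,\Phi(\Delta)(y_1 - y_2)\Big) + E\Big(\frac{|y_1 - y_2|}{2}\,\Phi(\Delta)(y_1 - y_2)\Big) \\
&= E\Big(\frac{y_1 + y_2}{2}\Big) E\big(\Phi(\Delta)(y_1 - y_2)\big) + \frac{1}{2} E\big(|y_1 - y_2|\,\Phi(\Delta)(y_1 - y_2)\big) \\
&= \frac{1}{2} E\big(|y_1 - y_2|\,\Phi(\Delta)(y_1 - y_2)\big).
\end{aligned}
\]

The penultimate equality follows from the fact that for iid normal variables $y_1$, $y_2$,
$y_1 - y_2$ and $y_1 + y_2$ are independent, and the last equality holds because for $\mu = 0$
we have $E(y_i) = 0$. The substitution $Z = \frac{y_1 - y_2}{\sqrt{2(\sigma_u^2 + \sigma_e^2)}}$ and standard calculations
show that the last term above equals

\[
(\sigma_u^2 + \sigma_e^2)\, E\Big(|Z|\,\Phi\Big(\sqrt{\tfrac{\gamma^*}{1 - \gamma^*}}\,Z\Big) Z\Big),
\]

where $Z$ is a standard normal random variable.

Let $a = \sqrt{\frac{\gamma^*}{1 - \gamma^*}}$. Using (3.1S) we obtain

\[
E\Big(y_{(2)}\Phi\Big(\gamma^*\frac{y_1 - y_2}{\sqrt{2}\,\sigma}\Big)(y_1 - y_2)\Big) = (\sigma_u^2 + \sigma_e^2) E\big(|Z|\Phi(aZ)Z\big) = (\sigma_u^2 + \sigma_e^2)\big(2\psi(a) - 1/2\big).
\]

To calculate the second term of (3.3S) we use a result from Siegel (1993), see
also Rinott and Samuel-Cahn (1994). It yields the second equality below, while
the others are straightforward:

\[
E\big(y_{(2)} y_2\big) = \mathrm{Cov}(y_2, y_{(2)}) = \mathrm{Cov}(y_2, y_2) P(y_2 = y_{(2)}) + \mathrm{Cov}(y_2, y_1) P(y_2 = y_{(1)}) = \frac{\sigma_u^2 + \sigma_e^2}{2}.
\]

The third part of (3.3S) is computed in the same way as the first part to give

\[
\sqrt{2}\,\sigma E\Big(y_{(2)}\varphi\Big(\gamma^*\frac{y_1 - y_2}{\sqrt{2}\,\sigma}\Big)\Big) = \sigma_e^2\, a\, E\big(|Z|\varphi(aZ)\big).
\]

The latter expectation becomes

\[
\int_{-\infty}^{\infty} |t|\varphi(at)\varphi(t)\,dt = 2\int_0^\infty t\varphi(at)\varphi(t)\,dt = \frac{1}{\pi(1 + a^2)} = \frac{1 - \gamma^*}{\pi}.
\]

Combining these results, and using $\gamma^*(\sigma_u^2 + \sigma_e^2) = \sigma_u^2$, we get

\[
E\big(\theta_{(2)} y_{(2)}\big) = 2\sigma_u^2\psi(a) + \frac{\sigma_e^2}{\pi}\sqrt{\gamma^*(1 - \gamma^*)}. \tag{3.4S}
\]

From (3.4S) and Lemma 1S, Lemma 3S follows readily. □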

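Proof of Theorem 5. It is easy to see that we can assume $\mu = 0$ without
loss of generality. We use the calculations of Theorem 3. Lemma 3S is used
instead of Lemma 1 for a better upper bound of $E\big(\theta_{(1)} y_{(1)} + \theta_{(2)} y_{(2)}\big)$ in the
normal case and $m = 2$.

Below we use the notation of Lemma 1S. By symmetry $E\big(\theta_{(1)} y_{(1)}\big) = E\big(\theta_{(2)} y_{(2)}\big)$.
Therefore, by Lemma 3S with $a = \sqrt{\frac{\gamma^*}{1 - \gamma^*}}$, we have $E\big(\theta_{(1)} y_{(1)} + \theta_{(2)} y_{(2)}\big) \le
4\sigma_u^2 Q_2(a) + 2\frac{\sigma_e^2}{\pi}\sqrt{\gamma^*(1 - \gamma^*)}$. By (7.2) and the above inequality we obtain

\[
\begin{aligned}
E\sum_{i=1}^{2} (y_{(i)} - \theta_{(i)})(y_{(i)} - \bar{y})
&= 2(\sigma_u^2 + \sigma_e^2) - \sigma_e^2 - E\sum_{i=1}^{2} \theta_{(i)} y_{(i)} \\
&\ge 2(\sigma_u^2 + \sigma_e^2) - \sigma_e^2 - 4\sigma_u^2 Q_2(a) - 2\frac{\sigma_e^2}{\pi}\sqrt{\gamma^*(1 - \gamma^*)} \\
&= 2\sigma_u^2 - 4\sigma_u^2 Q_2(a) + \sigma_e^2\Big(1 - \frac{2}{\pi}\sqrt{\gamma^*(1 - \gamma^*)}\Big) := v(\gamma^*).
\end{aligned}
\]

Recall from the proof of Theorem 3 the notation

\[
D(\gamma) := E\{L(\hat{\theta}^{[2]}_{()}(\gamma), \theta_{()})\} - E\{L(\hat{\theta}^{[1]}_{()}, \theta_{()})\}.
\]

In order to prove part 1 of Theorem 5, we have to show that its conditions imply
$D(\gamma) \le 0$.

By (7.3) for $m = 2$,

\[
\begin{aligned}
D(\gamma) &= (1 - \gamma)^2(\sigma_u^2 + \sigma_e^2) - 2(1 - \gamma) E\sum_{i=1}^{2} (y_{(i)} - \theta_{(i)})(y_{(i)} - \bar{y}) \\
&\le (1 - \gamma)^2(\sigma_u^2 + \sigma_e^2) - 2(1 - \gamma) v(\gamma^*) = (1 - \gamma)\big[(1 - \gamma)(\sigma_u^2 + \sigma_e^2) - 2v(\gamma^*)\big].
\end{aligned}
\]

We assume $0 \le \gamma \le 1$ and therefore $D(\gamma) \le 0$ provided $\gamma \ge 1 - \frac{2v(\gamma^*)}{\sigma_u^2 + \sigma_e^2} =: \omega(\gamma^*)$.
For $\gamma^* = 0$ ($a = 0$), $\omega(\gamma^*) = -1$ and clearly $D(\gamma) \le 0$ for all $\gamma$.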

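Next we show that in the range $0 < \gamma^* < \frac{\pi^2}{\pi^2 + 4} \approx 0.71$ $\big(0 < a < \frac{\pi}{2}\big)$ the
function $\omega(\gamma^*)$ has a single zero at $c \approx 0.4119$, and $\omega(\gamma^*) \le 0$ for $\gamma^* \le c$. This
implies that $\gamma \ge \omega(\gamma^*)$ and therefore $D(\gamma) \le 0$.

In this range of $\gamma^*$, $\omega(\gamma^*) = 1 + 4\gamma^*\big(\frac{a}{2\pi} - \frac{1}{4}\big) - 2(1 - \gamma^*)\big(1 - \frac{2}{\pi}\sqrt{\gamma^*(1 - \gamma^*)}\big)$.
Substituting $\gamma^* = \frac{a^2}{1 + a^2}$ we get
$\omega(\gamma^*) = 1 + \big(\frac{2a}{\pi} - 1\big)\frac{a^2}{1 + a^2} - \frac{2}{1 + a^2} + \frac{4a}{\pi(1 + a^2)^2}$. The
function $\omega(\gamma^*)$ has the same zeros as the function $P(a) := \frac{\pi}{2}(1 + a^2)^2\omega(\gamma^*)$ and
straightforward calculations show that $P(a) = a^5 + a^3 - \frac{\pi}{2}a^2 + 2a - \frac{\pi}{2}$, and that
this function is increasing in $a$ and therefore in $\gamma^*$. By numerical calculation we
obtain that it vanishes at $c \approx 0.4119$.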
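The numerical calculation mentioned above can be reproduced with a few lines of bisection. This is a minimal sketch with hypothetical function names; it relies only on the stated form of $P(a)$ and on the map $\gamma^* = a^2/(1 + a^2)$.

```python
import math


def P(a):
    """Polynomial with the same zeros as omega on 0 < a < pi/2."""
    return a ** 5 + a ** 3 - (math.pi / 2.0) * a ** 2 + 2.0 * a - math.pi / 2.0


def bisect_root(f, lo, hi, tol=1e-12):
    """Bisection for an increasing function with f(lo) < 0 < f(hi)."""
    if not (f(lo) < 0.0 < f(hi)):
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


a_root = bisect_root(P, 0.0, math.pi / 2.0)
c = a_root ** 2 / (1.0 + a_root ** 2)  # back to the gamma* scale
print(a_root, c)
```

The value of `c` obtained this way agrees with the constant 0.4119 quoted in the proof to about three decimal places.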

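The second part of Theorem 5 is proved by showing that $\gamma^* \ge \omega(\gamma^*)$ and
therefore $1 \ge \gamma \ge \gamma^*$ implies $\gamma \ge \omega(\gamma^*)$.

In the range $0 \le \gamma^* < \frac{\pi^2}{\pi^2 + 4} \approx 0.71$ $\big(0 \le a < \frac{\pi}{2}\big)$, we have $\frac{a}{2\pi} - \frac{1}{4} < 0$ and
$\sqrt{\gamma^*(1 - \gamma^*)} \le \frac{1}{2}$, so that
$\omega(\gamma^*) = 1 + 4\gamma^*\big(\frac{a}{2\pi} - \frac{1}{4}\big) - 2(1 - \gamma^*)\big(1 - \frac{2}{\pi}\sqrt{\gamma^*(1 - \gamma^*)}\big) \le 1 - 2(1 - \gamma^*)\frac{\pi - 1}{\pi}$.
Therefore, $\omega(\gamma^*) - \gamma^* \le \frac{2 - \pi}{\pi}(1 - \gamma^*) \le 0$.

In the range $\gamma^* \ge \frac{\pi^2}{\pi^2 + 4}$, $\omega(\gamma^*) = 1 - 2(1 - \gamma^*)\big(1 - \frac{2}{\pi}\sqrt{\gamma^*(1 - \gamma^*)}\big) \le
1 - 2(1 - \gamma^*)\frac{\pi - 1}{\pi}$. Therefore, $\omega(\gamma^*) - \gamma^* \le \frac{2 - \pi}{\pi}(1 - \gamma^*) \le 0$.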

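For the proof of the last part we use the same calculation as in Theorem 4 with
$m = 2$ to obtain

\[
\partial E\{L(\hat{\theta}^{[2]}_{()}(\gamma), \theta_{()})\}/\partial\gamma = 0 \quad \text{if and only if} \quad
\gamma = 1 - \frac{E\sum_{i=1}^{2} (y_{(i)} - \theta_{(i)})(y_{(i)} - \bar{y})}{\sigma_u^2 + \sigma_e^2}.
\]

By (7.2) we have

\[
E\sum_{i=1}^{2} (y_{(i)} - \theta_{(i)})(y_{(i)} - \bar{y}) = 2(\sigma_u^2 + \sigma_e^2) - \sigma_e^2 - E\sum_{i=1}^{2} \theta_{(i)} y_{(i)}.
\]

By (3.4S) we have $E\sum_{i=1}^{2} \theta_{(i)} y_{(i)} = 4\sigma_u^2\psi(a) + 2\frac{\sigma_e^2}{\pi}\sqrt{\gamma^*(1 - \gamma^*)}$. Hence,

\[
E\sum_{i=1}^{2} (y_{(i)} - \theta_{(i)})(y_{(i)} - \bar{y}) = 2\sigma_u^2\big(1 - 2\psi(a)\big) + \sigma_e^2\Big(1 - \frac{2}{\pi}\sqrt{\gamma^*(1 - \gamma^*)}\Big).
\]

Finally, using the convexity of $E\{L(\hat{\theta}^{[2]}_{()}(\gamma), \theta_{()})\}$, the optimal $\gamma$ is

\[
\gamma^o = \gamma^*\big(4\psi(a) - 1\big) + \frac{2}{\pi}(1 - \gamma^*)\sqrt{\gamma^*(1 - \gamma^*)}. \qquad □
\]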
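For $m = 2$ the expression for $\gamma^o$ can be evaluated directly. The sketch below is illustrative and rests on our own choices: $\psi$ is computed by a Simpson rule truncated at $t = 12$, the function names are hypothetical, and the input is restricted to $0 < \gamma^* < 1$.

```python
import math


def psi(a, upper=12.0, n=4000):
    """Simpson approximation of the integral of t^2 Phi(a t) phi(t) over [0, upper]."""
    h = upper / n  # n must be even
    total = 0.0
    for k in range(n + 1):
        t = k * h
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        cdf = 0.5 * (1.0 + math.erf(a * t / math.sqrt(2.0)))
        pdf = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
        total += weight * t * t * cdf * pdf
    return total * h / 3.0


def gamma_opt_m2(gamma_star):
    """Optimal gamma for m = 2 in the normal case, for 0 < gamma_star < 1."""
    a = math.sqrt(gamma_star / (1.0 - gamma_star))
    tail = math.sqrt(gamma_star * (1.0 - gamma_star))
    return gamma_star * (4.0 * psi(a) - 1.0) + (2.0 / math.pi) * (1.0 - gamma_star) * tail


for g in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(g, gamma_opt_m2(g))
```

At $\gamma^* = 1/2$ we have $a = 1$, and with $\psi(1) = \frac{3}{8} + \frac{1}{4\pi}$ from Lemma 1S the formula reduces to $\gamma^o = \frac{1}{4} + \frac{1}{\pi}$.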



10.2 Proof of Theorem 6 

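Note that $\hat{\theta}^{[2]}_{(i)}(\gamma) = (1 - \gamma)\bar{y} + \gamma g_i(y)$, and from (3.2S) $\hat{\theta}^{[3]}_{(i)} =
(1 - \gamma^*)\bar{y} + \gamma^* f_i(y)$, where $f_i(y)$ and $g_i(y)$ are functions of $y = (y_1, y_2)$ defined
for $i = 1, 2$ by

\[
f_i = f_i(y) = (-1)^i\Big(\Phi(\Delta)(y_1 - y_2) + \frac{\sqrt{2}\,\sigma}{\gamma^*}\varphi(\Delta)\Big) + y_{3-i}, \qquad g_i = g_i(y) = y_{(i)},
\]

\[
\Delta = \gamma^*\frac{y_1 - y_2}{\sqrt{2}\,\sigma}, \qquad \sigma^2 = \gamma^*\sigma_e^2.
\]

We have

\[
\begin{aligned}
E\big((\hat{\theta}^{[3]}_{(i)} - \theta_{(i)})^2 \mid y\big)
&= \mathrm{Var}(\theta_{(i)} \mid y) + \big(E\big((\theta_{(i)} - \hat{\theta}^{[3]}_{(i)}) \mid y\big)\big)^2 \\
&= \mathrm{Var}(\theta_{(i)} \mid y) + \big((1 - \gamma^*)(\mu - \bar{y})\big)^2 = \mathrm{Var}(\theta_{(i)} \mid y) + \big((1 - \gamma^*)\bar{y}\big)^2,
\end{aligned}
\]

where the last equality holds because for $m = 2$ we can assume that $\mu = 0$ without
loss of generality. In the same way,

\[
\begin{aligned}
E\big((\hat{\theta}^{[2]}_{(i)}(\gamma) - \theta_{(i)})^2 \mid y\big)
&= \mathrm{Var}(\theta_{(i)} \mid y) + \big(E\big((\theta_{(i)} - \hat{\theta}^{[2]}_{(i)}(\gamma)) \mid y\big)\big)^2 \\
&= \mathrm{Var}(\theta_{(i)} \mid y) + \big((1 - \gamma^*)\mu + \gamma^* f_i - (1 - \gamma)\bar{y} - \gamma g_i\big)^2 \\
&= \mathrm{Var}(\theta_{(i)} \mid y) + \big(\gamma^* f_i - (1 - \gamma)\bar{y} - \gamma g_i\big)^2.
\end{aligned}
\]

Therefore,

\[
\begin{aligned}
d(\gamma) &:= E\{L(\hat{\theta}^{[3]}_{()}, \theta_{()})\} - E\{L(\hat{\theta}^{[2]}_{()}(\gamma), \theta_{()})\} \\
&= 2E\big((1 - \gamma^*)\bar{y}\big)^2 - E\big(\gamma^* f_1 - \gamma g_1 - (1 - \gamma)\bar{y}\big)^2 - E\big(\gamma^* f_2 - \gamma g_2 - (1 - \gamma)\bar{y}\big)^2 \\
&= 2\big((1 - \gamma^*)^2 - (1 - \gamma)^2\big) E(\bar{y}^2) - E\big(\gamma^* f_1 - \gamma g_1\big)^2 - E\big(\gamma^* f_2 - \gamma g_2\big)^2 \\
&\quad + 2(1 - \gamma) E\big[\big(\gamma^*(f_1 + f_2) - \gamma(g_1 + g_2)\big)\bar{y}\big].
\end{aligned}
\]

From the definitions of $f_i$ and $g_i$ it follows that $f_1 + f_2 - g_1 - g_2 = 0$, and the
last term vanishes for $\gamma = \gamma^*$. It is now easy to see that $d(\gamma^*) \le 0$. □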



